Resources for Information Extraction from Polish texts

Authors

  • Agnieszka Mykowiecka
  • Anna Kupść
  • Małgorzata Marciniak
  • Jakub Piskorski
Abstract

The paper presents a collection of resources developed for Information Extraction (IE) from Polish texts. In particular, we mention two IE platforms adapted to Polish and several IE applications built on top of one of them: named entity recognition, creation of terminology lexicons, and data extraction from medical texts.


Similar resources

Terminology extraction from medical texts in Polish

BACKGROUND: Hospital documents contain free text describing the most important facts relating to patients and their illnesses. These documents are written in a specific language containing medical terminology related to hospital treatment. Their automatic processing can help in verifying the consistency of hospital documentation and obtaining statistical data. To perform this task we need informat...


Automatic Processing of Diabetic Patients' Hospital Documentation

The paper presents a rule-based information extraction (IE) system for Polish medical texts. We select the most important information from diabetic patients’ records. Most of the data being processed are free-form text; only a part is in table form. The work has three goals: to test classical IE methods on texts in Polish, to create a relational database containing the extracted data, and to prepare an...


Presenting a method for extracting structured domain-dependent information from Farsi Web pages

Extracting structured information about entities from web texts is an important task in web mining, natural language processing, and information extraction. Information extraction is useful in many applications including search engines, question-answering systems, recommender systems, machine translation, etc. An information extraction system aims to identify the entities from the text and extr...


Dependency-based Extraction of Entity-relationship Triples from Polish Open-domain Texts

We present a prototype system for extracting arbitrary relations between named entities from open-domain texts in Polish, based on DEBORA, a dependency-based approach to the problem. The presented method is designed for the purpose of the conducted experiment and is adapted to the morpho-syntactic properties of Polish, e.g. free word order and a high degree of morphological marking. Our preliminary resu...


DEBORA: Dependency-Based Method for Extracting Entity-Relationship Triples from Open-Domain Texts in Polish

We present DEBORA, a dependency-based approach to the problem of extracting arbitrary relations between named entities from open-domain texts in Polish. The presented method is designed for the purpose of the conducted experiment and is adapted to the morpho-syntactic properties of Polish, e.g. free word order and a high degree of morphological marking. Our preliminary results show that the method i...




Publication date: 2007